In higher education, assessment is more challenging than many would prefer to admit 
(Bloxham, Boyd, & Orr, 2011; Yorke, 2008), and these challenges are, if anything, even more 
profoundly experienced in the area of work-integrated learning (WIL). The field of WIL 
encompasses a diffuse range of on- and off-campus learning experiences, either in the 
workplace or in simulations of it, designed to enable students to integrate academic theory 
and workplace practice (Jackson, 2013). As Wilton (2012) pointed out, successful WIL 
experiences are seen by employers to offer key points of differentiation between graduates. 

Judgments of student success arise from processes of assessment, and 'assessment' is taken 
here to mean the appraisal of student work in order to make a judgment of performance 
(Sadler, 2005). In the context of WIL, assessment might be conducted by those within the 
higher education institution or those external to it, and this is an important point which will 
be picked up later. In either case, appraisals of student performance are increasingly made 
with respect to learning standards, which may be explicitly and/or implicitly defined. 
Learning standards are essentially key reference points that describe what students should 
know or can do (Bloxham et al., 2011; Price, 2005; Sadler, 2007). 

Despite the significant attention directed towards assessment and learning standards by 
universities, it remains a challenging and contentious field. Students commonly identify 
assessment as an area in need of attention in their evaluations of taught programs (Jessop, El 
Hakim, & Gibbs, 2014; Scott, 2005; Williams & Kane, 2008). This sentiment is also reflected in 
institutional audits of quality. For example, Ewan (2009) in her thematic analysis of audits 
conducted by the Australian Universities Quality Agency noted that approximately a fifth of 
the audit recommendations for universities revolved around the need to improve or develop 
consistent assessment policies and practices within their institutional quality frameworks. 
More recently, echoing a broader point made by Yorke (2011), Natoli, Jackling, Kaider, and 
Clark (2013) described how the assessment practices in the discipline of accounting lagged 
behind their particular institution's new WIL policy. Some of the reasons for this were 
ascribed to problems of resourcing; others related to the way the policy authors and 
academic staff had different interpretations of some of the policy terminology (such as what 
was meant by a 'simulated environment'). Natoli et al. (2013) concluded by calling for 
university staff professional development and capacity building in the domain of WIL, 
coupled with more effective communication regarding the university's quality policy. 

Conceptions of quality remain multiple, malleable, and highly contested. Harvey and Green 
(1993) offered a seminal definition of quality, suggesting that it could be described in any or 
all of the following terms: excellence; perfection; fitness for purpose; value for money; and 
transformation. Traditional conceptions of quality in higher education as 'excellent 
standards' (Vidovich, 2001) have much in common with Harvey and Green's first 
classification. Such standards may be maintained (and advanced) through processes of 
quality improvement, and proven through processes of quality assurance. Vidovich (2009) has 
suggested that the balance of power has, over time, shifted away from the 'improve' (internally 
oriented critical reflections) and towards the 'prove' (externally oriented accountability), an 
observation also made elsewhere (Ranson, 2003; Reid, 2009). A heightened focus on 
accountability is evident in quality policy discourses on a global scale, and these, it is argued 
here, set an important context for the assessment of work-integrated learning. 

GLOBALIZATION AND THE RISE OF QUALITY POLICIES 

Higher education is increasingly seen as the engine driving economic productivity in a global 
knowledge society, and many countries have set ambitious participation targets for higher 
education (Bradley, Noonan, Nugent, & Scales, 2008). However, the transition from elite to 
mass participation in higher education has augmented the cost burden on nation states, 
raising questions of quality and triggering concerns over value for money. The Organisation 
for Economic Cooperation and Development (OECD), in its landmark report titled Tertiary 
Education for the Knowledge Society, identified a number of global trends in quality assurance, 
recognizing the adoption of new public management approaches across a number of OECD 
governments with an emphasis on "leadership principles, incentives and competition" 
(Santiago, Tremblay, Basri, & Arnal, 2008, p. 260). Here, quality assurance was linked to 
economic growth through the signals sent to labor markets by high quality graduates who 
are able to participate and compete in a global marketplace. Contemporary quality policies 
seek to assure learning standards in a globally competitive context, and to provide a level of 
protection for higher education consumers (students, employers) from substandard 
providers. 


Asia-Pacific Journal of Cooperative Education, Special Issue, 2014,15(3), 225-239 





YORKE, VIDOVICH: Quality policy and the role of assessment in work-integrated learning 


The OECD initiated a major project titled Assessment of Higher Education Learning Outcomes 
(AHELO) in January 2010. Within this three year project, 248 institutions in 17 countries took 
part (OECD, 2012). This included the U.S. and Australia as full participants, with England 
taking part as an observer. AHELO sought to develop and apply standardized testing 
instruments to evaluate generic skills (critical thinking, analytical reasoning, problem solving 
and written communication) and some discipline-specific skills within the fields of 
economics and engineering. Contextual information regarding the background and learning 
environments was also collected as part of the project, which was designed to support 
processes of international benchmarking (OECD, 2011), in a similar way to the OECD's 
Program for International Student Assessment (PISA) benchmarking mechanism in the 
schooling sector. Significantly, AHELO focused on the 'academic' outcomes outlined above, 
leaving a number of aspects of WIL unexplored, and potentially marginalized. Even though 
AHELO analyses were confined to a limited subset of indicators in the generic skill strand, it 
was determined that "the instrument would require further consultation to provide evidence 
of content validity" (OECD, 2013a, p. 30) before it could be used as a basis for international 
comparisons. The final volume of the feasibility study report concluded that there was an 
ongoing need for AHELO type data, but the national costs were not known and "more data 
and analysis was still to be gained from the feasibility study" (OECD, 2013b, p. 45) before 
making any decision to proceed with a full scale implementation. 

The growing focus on quality and standards, arguably forged by the OECD and other 
international organizations and networks, is evident in policy developments in countries 
across the globe, and in subsequent paragraphs these trends are discussed specifically in 
relation to the U.K., Europe, and U.S., before presenting the Australian policy context in the 
following section. 

The U.K. has had a long standing interest in the quality and comparability of academic 
qualifications across and beyond the sector. Early work conducted by Johnes and Taylor 
(1990) reported a number of differences between institutional degree classifications in an 
attempt to 'benchmark' or compare standards across institutions. In 1999, 'subject 
benchmark statements' were brought to prominence following a landmark review of U.K. 
Higher Education titled Higher Education in the Learning Society (National Committee of 
Inquiry into Higher Education, 1997). Subject benchmark statements describe the threshold 
and typical sets of attributes, skills and capabilities that a graduate of a particular discipline 
would be expected to have. These subject benchmark statements, setting out broad 
expectations for graduates of the discipline, were developed in response to challenges about 
standards. It is pertinent to note that these were originally termed benchmark standards; 
however, by the time the first benchmarks were published in May 2000 they had been 
relabeled as benchmark statements, a move which prompted comment that this "change 
recognized the failure of the process to clearly define explicit standards for all subjects" 
(Rust, Price, & O'Donovan, 2003, p. 148). These statements are not intended to specify a 
detailed curriculum or favor particular assessment approaches. Instead, they are intended to 
inform and assist those involved in program design and review through an established 
consensus on the nature of standards in that discipline (Quality Assurance Agency for 
Higher Education, 2010). 

In Europe, the signing of the Bologna Declaration in 1999 by ministers from 29 countries 
signaled a commitment to work towards comparable degree specifications with broad 
outcomes defined at bachelor, masters and doctoral levels (European Higher Education Area, 
1999). By 2000, the European 'Tuning' project was linking the Bologna declaration to 
activities in the education sector, ultimately proceeding to produce a 'tuning process' and a 
set of associated tools (Tuning Management Committee, 2006) to enhance international 
alignment of degree standards. For example, the Competences in Education and Recognition 
Project produced A Tuning Guide to Formulating Degree Qualification Profiles, which included 
references to program learning outcomes, the purpose of which was to "describe accurately 
the verifiable learning achievements of a student at a given point in time" (Lokhoff et al., 
2010, p. 22). 

In the U.S., an increased focus on learning outcomes, their measurement and their 
comparability was also becoming more apparent at the turn of the millennium. In the school 
sector the enactment of No Child Left Behind in January 2002 had introduced standardized 
testing for all students on an annual basis, with severe measures for those schools that failed 
to demonstrate adequate yearly progress (Hursh, 2008). In the higher education area, the 
Spellings Commission published A Test of Leadership: Charting the Future of U.S. Higher 
Education, calling for increased accountability and suggesting that "higher education 
institutions should measure student learning using quality assessment from instruments 
such as, for example, the Collegiate Learning Assessment" (U.S. Department of Education, 
2006, p. 24). 

The Collegiate Learning Assessment had been released in 2002 by the Council for Aid to 
Education (a not-for-profit educational foundation) in order to measure critical thinking, 
analytical reasoning and written communication using a standardized test. However, this 
test does not include broader WIL related aspects such as teamwork, oral communication, 
civic engagement, ethical reasoning, or intercultural knowledge and competence. Partly in 
reaction to what were perceived as 'narrow' quality indicators and partly in anticipation of 
increased accountability foreshadowed by the Spellings Commission, the Association of 
American Colleges and Universities embarked on a three year mission in 2007 to develop 
Valid Assessments of Learning in Undergraduate Education. A key outcome of this project was to 
collaboratively develop and agree a set of detailed standards for 15 essential learning 
outcomes, including those associated with the assessment of WIL, expressed at the level of 
the graduate (Association of American Colleges and Universities, 2006). More recently, the 
American Lumina Foundation launched the Degree Qualifications Profile (DQP), which 
"illustrates clearly what students should be expected to know and be able to do once they 
earn their degrees" (Lumina Foundation, 2011, p. 1). The DQP describes five areas of 
learning, which are "Broad, Integrative Knowledge; Specialized Knowledge; Intellectual 
Skills; Applied Learning, and Civic Learning" (Lumina Foundation, 2011, p. 4). 

In all, over a number of years, U.K., European and American higher education quality 
policies and various stakeholders have given increasing attention to learning standards and, 
in particular, their comparability across disciplines, institutions, and regions. More recently 
this has extended to include international comparisons through the work of the OECD (2012, 
2013a, 2013b). In many settings there is an interest in furthering WIL (the American DQP is a 
good example of this). However, when it comes to the quality assurance of learning, 
standards tend to be more narrowly defined and these often neglect aspects of WIL. This is a 
particularly ironic state of affairs given that quality policies aspire to improve employability 
and advance economic productivity. These evolving policy developments are also readily 
recognizable in the Australian context, as outlined in the next section. 

QUALITY POLICY DEVELOPMENTS IN AUSTRALIA 

In Australia, the focus on 'quality' in higher education sharpened in the early 1990s with 
Australia's first official quality policy (Baldwin, 1991). This policy largely constructed 
quality in terms of 'excellent standards' in universities, focusing on institutional processes. 
This view was to gradually transition towards discourses of quality assurance (Vidovich, 
2001). In 1999, Minister Kemp pointed to the inability to compare Australian standards with 
other countries (Kemp, 1999), foreshadowing heightened interest in international 
competitiveness in higher education in the first decade of the new millennium. By early 
2008, following the election of an Australian Labor Government, the Minister for Education 
(Gillard) initiated a major review of the sector, under the aegis of a panel chaired by former 
Vice-Chancellor Denise Bradley with a remit to examine whether higher education was 
"meeting the needs of the Australian community and economy" (Bradley et al., 2008, p. 205) 
in the international marketplace. The final report of the panel, titled Review of Australian 
Higher Education and known as the 'Bradley report', was released in December 2008, 
heralding significant change and a renewed emphasis on quality outcomes from the higher 
education sector. It highlighted the need for Australia to invest to increase the proportion of 
the population with a bachelor degree, and argued that stronger accountability was needed 
to improve and assure the quality of those graduates. Underpinning these conclusions lay a 
belief that Australia was losing ground against other countries, thereby being positioned at a 
competitive disadvantage globally. The Bradley report ultimately made 46 
recommendations to the Federal Government (Bradley et al., 2008), and in the context of 
WIL, Recommendation 23 is particularly relevant. Recommendation 23 firmly positioned 
learning standards within the umbrella of the new quality assurance arrangements, arguing 
that "a set of indicators and instruments to directly assess and compare learning outcomes" 
was needed, together with "a set of formal statements of academic standards by discipline 
along with processes for applying those standards" (Bradley et al., 2008, p. 137). 

The Australian Government accepted the majority of Bradley's recommendations, and in 
2009 a landmark policy framework was put forward. Positioned as an overarching ten year 
vision, Transforming Australia's Higher Education System articulated a comprehensive response 
to the Bradley review, ushering in what was labelled "A New Era of Quality in Australian 
Tertiary Education" (Australian Government, 2009, p. 31). 

In the four years that have followed the release of Transforming Australia's Higher Education 
System, various Government policy consultations have sought to establish the direct 
measures of learning proposed by Bradley et al. (2008), but these moves were vigorously 
resisted by the sector. Proposals to reward universities for the quality of learning outcomes 
via performance funding were challenged, and eventually dropped. Various standardized 
tests of learning standards were proposed as performance indicators, such as the Graduate 
Skills Assessment and the Collegiate Learning Assessment (from the U.S.), but these 
suggestions were met with considerable challenge by a sector that saw these indicators as 
flawed, narrowly defined, and irrelevant to the Australian context. Proposals for 
performance indicators of learning standards were finally dropped following another round 
of consultation (Advancing Quality in Higher Education Reference Group, 2012). In their 
place, and to assure that the generic skills of graduates were able to meet the needs of 
employers, the reference group proposed a literature review and scoping study to investigate 
the feasibility of an employer satisfaction survey. 

During this period, a number of projects relating to learning standards emerged. The former 
Australian Learning and Teaching Council (ALTC) initiated the Learning and Teaching 
Academic Standards project, which drew together disciplinary groupings, led by 'Discipline 
Scholars', who were experienced academics at a professorial level (ALTC, 2010). The key 
deliverable for Discipline Scholars was the production of a document containing 'Threshold 
Learning Outcomes', as a step towards articulating the minimum standards for graduation in 
that discipline. Other projects emerged with a view to verifying standards through processes 
of external comparison, such as the Quality Verification System (Group of Eight, 2011). 

In 2013, a new consultation was released by the Higher Education Standards Panel, an expert 
advisory body established to provide independent advice to the Minister(s) responsible for 
tertiary education. This consultation sought comment from the sector on a new proposal that 
abandoned quantitative performance measures of learning standards in favor of judgment-based 
processes combined with periodical peer review (Australian Government, 2013). This 
approach would represent a policy turn that could potentially create an environment less 
hostile to the assessment of WIL. However, at the time of writing (December 2013), this 
policy discussion was ongoing and no decision had been reached. 

Taken together, these international and Australian quality policy developments set an 
important context for the assessment of WIL. The discussion so far indicates that quality is 
becoming increasingly defined with an eye to international competitiveness, based on 
performance indicators that are often narrowly defined. There has been significant 
contestation regarding the assurance of learning standards within the sector more broadly, 
but policy deliberations to date have only marginally addressed issues specific to the 
assessment of WIL. Governments in the U.K., U.S. and Australia have struggled to establish 
'measures' of learning in formal higher education institutions where the terrain is 
comparatively well defined and controlled. In the diverse and less centrally controlled 
domain of WIL the issue of quality and standards becomes even more difficult. The 
following section explores some of the reasons why quality assurance of assessment in the 
WIL context is particularly challenging. 

LEARNING STANDARDS, QUALITY AND WIL 

The assessment of student performances in the light of learning standards is increasingly 
seen as central to institutional quality, but the problems associated with standards-based 
grading decisions are long-standing (Bloxham et al., 2011; Newstead & Dennis, 1994; Price, 
Carroll, O'Donovan, & Rust, 2011; Woolf, 2004; Yorke, 2008). Bloxham et al. (2011) suggested 
that there are two oppositional theoretical frameworks for assessment at the heart of 
arguments regarding learning standards. On the one hand, positivist views of assessment 
suggest that standards can be objectively defined with precision, and that achievement can 
be reliably measured against those standards. On the other hand, interpretative views see 
standards as those broader, tacit, normative, and consensually established judgments, or 
'rules of the discipline'. There is significant contestation between these opposing positions 
with proponents of detailed exposition of learning standards arguing for approaches more 
consistent with specification and measurement, whilst others at the opposite end of this 
continuum call for holism and judgment. The following sections briefly address three key 
challenges for assessment in a WIL context. These challenges relate to the difficulty of 
defining and specifying learning standards; the problems with applying those standards to 
consistently appraise work through grading practices; and the differences in the way results 
from WIL and non-WIL contexts are treated. Each is examined in turn. 

DEFINING AND SPECIFYING LEARNING STANDARDS 

In the OECD's Tertiary Education for the Knowledge Society, Santiago et al. (2008, p. 312) took 
up the link between learning outcomes and quality, asserting that "in the absence of objective 
measures of learning outcomes, there is no way for students to judge the quality of TEIs 
[tertiary education institutions] except by reputation". However, these objective measures of 
learning standards may be somewhat elusive, and their pursuit may lead to undesirable 
consequences (Knight, 2002a; Sadler, 2007). These challenges are, if anything, more acutely 
experienced in the field of WIL (Yorke, 2011). 

Clearly defined learning standards, coupled with 'transparent' assessment processes, are 
intended to improve consistency, thereby reducing the arbitrariness of staff decisions and 
rendering matters open to scrutiny by other interested parties (Boud & Associates, 2010). 
Furthermore, Stowell (2004) suggested that clear specification is needed in order to address 
issues of student equity. As Sadler (2009) has argued, the provision of clear criteria has 
become established 'best practice', to the point where detailed expositions of standards are 
considered to be mandatory in some universities. 

However, the over-zealous specification of learning standards has been subject to sharp and 
persistent criticism by a number of scholars such as Sadler (2007, 2008, 2009) who pointed to 
the various ways in which students are 'short changed' by tightly defined standards. Despite 
having well-documented outcomes and criteria, "final-year students were not clear about 
goals and standards" in a recent study of 23 degree programs in eight U.K. universities 
(Jessop et al., 2014, p. 82). In fact, standards that are too closely prescribed may even be 
counter-productive to learning. Discussing assessment matters in a workplace context, 
Torrance (2007) reported that the explicit specification of learning standards in the interests 
of clarity and transparency had demonstrably led to narrow and instrumental responses on 
the part of learners. This position is especially unfortunate given that higher education 
aspires to allow for (and encourage) a variety of equally acceptable approaches to a 
particular task, and this diversity is particularly relevant in the case of WIL. 

Knight (2002b, p. 280) in his analysis of assessment practices drew together Eisner's (1985) 
conceptions of 'connoisseurship' with learning theory and psychometrics to assert that in 
higher education, "benchmarks, specifications, criteria and learning outcomes do not and 
cannot make summative assessment reliable, may limit its validity and certainly compound 
its costs". Connoisseurship describes the subjective but skillful judgments made by 
experienced practitioners who have progressively immersed themselves in the discipline 
over time as they progress in a community of academic practice (Lave & Wenger, 1991). A 
similar line is taken by Jawitz (2009), who drew on Bourdieu's conceptions of 'habitus' to 
suggest that staff gradually become more familiar with the tacit criteria and standards 
through social processes of consensus building over time. However, as Sadler (2009) argued, 
holistic judgments based on connoisseurship may be at odds with the grade or mark 
produced by following a closely prescribed set of standards for each of the criteria. On this 
point, Trede and Smith (2014, p. 165) observed that when workplace assessors' judgment-based 
mark did not align with the mark that was obtained from the use of a standard 
assessment form, they "would not act on their judgment but reverted back to dominant 
practices as defined by the material objects they had been given". In other words, the skillful 
judgments of connoisseurship were overruled by the specifications contained in the 
assessment forms. Whilst these issues raise questions for assessment practices more broadly, 
they are far more problematic for WIL, especially given that assessors external to the higher 
education institution are likely to be peripheral to many institutional processes. 

GRADING PRACTICES 

In seminal work conducted by Newstead and Dennis (1994), the reliability of grades 
awarded by highly experienced academics in the field of psychology was shown to be poor, 
even where the task domain and expected learning standards were clearly specified. Since 
then, this has been repeatedly observed in successive studies (Bloxham, 2009; Newstead, 
2002; Orr, 2007; Price et al., 2011). Knight (2006) referred to the local practices of assessment 
to describe how grading decisions are made in a particular context, arguing that they reflect 
both the nature of the assessment task and the circumstances in which the assessor made 
their judgment. It may be extremely difficult to achieve consistency on grades/marks 
awarded for individual assessment tasks, even where learning standards are 
comprehensively and exhaustively defined. Student performance in a workplace setting, 
which can be a comparatively 'uncontrolled' assessment environment, is impacted by a 
diverse set of variables (Cooper, Orrell, & Bowden, 2010). From a quality assurance point of 
view, these problems become almost intractable in the 'messier' world of WIL. 

The consistency of grading judgments may be improved through processes of moderation, 
including post-hoc analyses of results or more holistic approaches that include a priori 
activities such as consensus building pre-marking meetings (Oliver, Lawson, & Yorke, 2008). 
Post-hoc moderation approaches to the quality assurance of learning standards include 
'double marking' where a second assessor appraises the work, and a variety of strategies 
may be employed such as random sampling, purposive sampling according to pre-set rules, 
or some combination of both. Second marking may also be conducted 'blind' where first 
marker comments are made unavailable to successive markers. However, Bloxham (2009) 
has warned of the 'false promise' of moderation for the purposes of quality assurance, 
pointing to power dynamics in marking and the imperfect way in which disagreements 
between markers are rationalized. Often, she argued, the different marks given by two 
disagreeing markers are simply averaged, an observation borne out in empirical studies 
conducted by Orr (2007). The averaging of differing marks is perhaps a disservice to the 
judgments of both markers. 
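
To illustrate why simple averaging can obscure disagreement, the following minimal Python sketch compares two hypothetical double-marked scripts. The marks, the function name `summarize_double_marks`, and the 10-point discussion threshold are assumptions made purely for illustration; they are not drawn from Bloxham (2009) or Orr (2007).

```python
def summarize_double_marks(first_mark, second_mark, discussion_threshold=10):
    """Summarize two markers' percentage marks for one script.

    discussion_threshold is a hypothetical cut-off: gaps above it are
    flagged for a moderation conversation instead of silent averaging.
    """
    gap = abs(first_mark - second_mark)
    return {
        "average": (first_mark + second_mark) / 2,
        "gap": gap,
        "needs_discussion": gap > discussion_threshold,
    }


# Two hypothetical scripts: the markers agree on one and disagree sharply on the other.
agreed = summarize_double_marks(60, 60)
contested = summarize_double_marks(48, 72)

# Both scripts receive the same averaged mark, although only one reflects consensus.
print(agreed)
print(contested)
```

In this sketch the averaged mark alone cannot distinguish a script on which markers agreed from one on which they differed by 24 points, which is the sense in which averaging may underserve both judgments.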

Furthermore, many approaches to moderation assume the presence of an assessment 
'artefact' of some sort, such as an essay, reflective report, or a video of practice. In the WIL 
context, such artefacts are not necessarily always readily available, and the level of 
intervention needed to acquire such evidence may serve to detract from the learning 
experience. There are a number of difficulties associated with the moderation of mentors' 
judgments of practice based on a series of observations in a variety of settings over time, for 
example. For these and other reasons, it is perhaps unsurprising that institutions elect to 
retain responsibility for WIL related assessment decisions to the extent they do (Ferns & 
Moore, 2012). 

GRADING IN WIL AND NON-WIL CONTEXTS 

The discussion so far has highlighted a number of problems with learning standards, from 
their definition to their application. Higher education outcomes have historically been 
determined from an aggregation of individual assessment task grades or marks. Assessment 
results can be described in a number of ways, and there are strengths and weaknesses in each 
approach (Yorke, 2008). However, it is not uncommon for the assessment of WIL to be 
treated differently to assessment in other (non-WIL) areas within the university. 

In the Australian university setting, individual assessment task results are commonly 
expressed as a percentage mark. However, the awarding of numerical percentages in the 
WIL setting is less common (Reddan, 2013) where results are more commonly cast as a 
binary pass/fail, or located in stratified bands of performance using single letter grades (such 
as A, B, C) or descriptive terms (such as 'pass' or 'distinction'). The mathematical summation 
of different task results also poses a problem for the institution and there are widely reported 
statistical weaknesses encountered when individual assessment results are aggregated to 
generate an overall result (Knight, 2002a; Woolf, 2004; Yorke, 2008). 
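
One weakness of this general kind can be sketched numerically: when task marks with very different spreads are combined using nominal weights, the task with the wider spread tends to dominate the overall ranking. The Python sketch below uses invented marks for five hypothetical students on two equally weighted tasks. The task names, marks, and weights are all assumptions for illustration, and this is only one possible aggregation problem, not necessarily the specific one analyzed by the authors cited above.

```python
from statistics import pstdev

# Hypothetical percentage marks for five students (same student order in each list).
marks = {
    "exam": [35, 50, 65, 80, 95],        # widely spread marks
    "placement": [68, 66, 70, 64, 72],   # tightly clustered marks
}
weights = {"exam": 0.5, "placement": 0.5}  # nominally equal weighting


def weighted_totals(marks, weights):
    """Return each student's overall mark as a weighted sum of task marks."""
    n_students = len(next(iter(marks.values())))
    return [
        sum(weights[task] * marks[task][i] for task in marks)
        for i in range(n_students)
    ]


def rank_order(values):
    """Return student indices ordered from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


totals = weighted_totals(marks, weights)
spreads = {task: pstdev(values) for task, values in marks.items()}

print("overall marks:", totals)
print("spread per task:", spreads)
print("ranking by overall mark:", rank_order(totals))
print("ranking by exam only:", rank_order(marks["exam"]))
print("ranking by placement only:", rank_order(marks["placement"]))
```

With these invented figures the overall ranking reproduces the exam ranking exactly, even though the placement nominally carries half of the weight; a tightly banded WIL result could be marginalized in this way without any change to its stated weighting.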

These issues pose a number of problems for the assessment of WIL, in that differing grading 
practices may instantiate different 'rules of engagement' with students. For example, 
Reddan (2013) reported that some students felt they would tend to put less effort into 
ungraded WIL activities. Furthermore, where WIL tasks are graded, the weighting of 
assessment accorded to WIL may be marginalized. Natoli et al. (2013, p. 80) remarked on the 
low apportioning of marks given to WIL activities, which signaled their lower status in 
comparison to other activities, pointing out that this risked downplaying the "depth and 
significance of any WIL learning [sic]". These are endemic problems of assessment but they 
become heightened in a WIL context. 

IMPLICATIONS FOR THE ASSESSMENT OF WIL 

As we have outlined, the assessment of tasks in the workplace, or close simulations of it, is 
somewhat 'messy'. The assessment of WIL often contains unpredictable aspects, and 
students often have to complete their task with variable, incomplete or inaccurate data. The 
'scaffolding' of WIL experiences can be qualitatively different to that in non-WIL settings 
(see, for example, the discussion by Hodges, Eames, & Coll, 2014). Learning standards in 
relation to WIL are not easy to define, and they are decidedly difficult to apply consistently 
given the inherent variation within the domain. These issues represent a conundrum for 
Australian higher education institutions given the current Government policy focus on the 
development of direct measures of learning, assayed through indicators that have been up 
until now fairly narrowly defined. However, this problem is not new, nor is it unique to 
Australia. This final section summarizes four key issues that seem to be central to the 
assessment of WIL, and briefly sketches out some possibilities for future work. 

First, to an extent, assessment practices in WIL have lagged behind developments in the 
provision of WIL, and there remains an ongoing need for professional development 
opportunities for those assessing in a WIL context. This appears to be just as important for 
those engaged with WIL within the institution as for those external to it. External 
participants in WIL programs often lack time and resources to engage with assessment 
issues, and this may be more acute for small to medium enterprises. 

Second, the aggregation of student results from WIL and non-WIL assessment tasks should 
be organized to avoid inadvertently downplaying the status of WIL activities. Institutional 
quality and assessment policies need to be sensitive to this risk, and closer articulation of 
these activities may prove fruitful. For example, institutions often develop a separate policy 
to help emphasize new initiatives (see, e.g., Natoli et al., 2013), but more may be gained by 
embedding WIL in existing policies. Some universities have moved, at least in part, away 
from pass/fail binary judgments of WIL towards the use of grades and found the outcome to 
be positive (Reddan, 2013). However, a parity of esteem will not be adequately established 
until both WIL and non-WIL judgments carry equal status. Given the limitations of 
percentage marking in a number of disciplines, there is an emerging argument for 
harmonizing on a common grading format across WIL and non-WIL contexts, although this 
would need to be nuanced to the institutional and disciplinary setting. 

Third, whilst it is difficult to define and apply learning standards with fine precision in many 
areas of higher education, this does not imply that a standards-based approach is 
inappropriate. Broadly consistent judgments of learning standards (and, by extension, 
quality) are possible, but both explicit and tacit knowledge are required if standards are to be 
communicated effectively in any direction within or between students or staff. With respect 
to the assurance of quality, assessor 'calibration' exercises (Sadler, 2012) offer a means of 
demonstrating consistency in the way learning standards are used without recourse to 
time-consuming moderation activities. 

Finally, there are significant risks for WIL if quality (at an institutional or national level) is 
construed using narrowly defined performance indicators that do not encompass those 
dimensions that are critical for success, but difficult to measure. The use of narrowly defined 
conceptions of learning may serve to marginalize other aspects such as ethical practice or 
working with others if such aspects fly under the radar of quality performance indicators. 
There may be far-reaching implications for equity and diversity in higher education if quality 
and accountability are to be conflated with standardized testing (De Lissovoy & McLaren, 
2003). 

Sharply defined performance indicators that are subject to public comparison may 
reorientate activities towards supporting those which have impact on these measures (Ball, 
2012), a situation which may have damaging consequences. Appearances of poor 
performance (based on potentially flawed quality indicators) can have far-reaching 
implications for institutions. One stark example of this is provided by Salmi (2009), who 
reported that demand for courses evaluated positively by the Brazilian Provao (a national 
mandatory assessment of graduate outcomes in certain fields of study) rose by some 20% 
whilst the demand for courses with a negative Provao assessment fell by 41%: a 
high-stakes test, indeed. 

In short, a considerable body of literature suggests that direct, comparable and reliable 
measures of learning are beyond our current reach. If so, this suggests that quality policies 
ought to focus on improving judgments, shifting the focus more towards the establishment 
of a "controlled reputational range" (as cited in Brown, 2010, p. 3), an approach that is more 
tolerant of diversity. 

CONCLUDING DISCUSSION 

Massaro (2010, p. 23) suggested that quality policies are increasingly concerned with 
ensuring that education qualifications are globally transportable, "comparable in standard 
from one country to another", and quality assured through "reports to society that are 
comprehensible to it". These potentially competitive aspirations are clearly evident in the 
international and Australian policy agendas to advance economic prosperity through higher 
education outcomes. However, in the Australian context it is currently unclear whether the 
Government will continue to seek direct 'measures' of learning in future iterations of quality 
policy, or whether the softer accountabilities recently proposed by the Higher Education 
Standards Panel will prevail. 

This paper adds to the body of literature arguing that quality should not be exclusively cast 
in terms of a narrowly defined set of indicators (at either national or institutional level). 
Alongside other weaknesses, narrow definitions of quality could be particularly detrimental 
to WIL. There is arguably more to be gained through the development of judgments, 
supported and appraised by processes of peer review, and underpinned by a philosophical 
acceptance that assessment in higher education is somewhat imprecise. 

There are difficult challenges for the assessment of WIL embedded in the issues outlined 
here. In the Australian quality policy context, the discussion may be turning from 
measurement towards judgment, and the removal of narrow performance indicators would 
provide a policy environment that could potentially be sympathetic to the assessment of 
WIL. Admittedly, progress is likely to be neither easy nor swift. However, if it is agreed that 
the practices of WIL form a crucially important part of students' higher education, then much 
more attention needs to be given to assessment in this domain if we are to support and 
realize national policy intentions to develop highly employable graduates. 